YORO - Lightweight End to End Visual Grounding
Abstract
We present YORO, a multi-modal transformer encoder-only architecture for the Visual Grounding (VG) task. This task involves localizing, in an image, an object referred to via natural language. Unlike the recent trend in the literature of using multi-stage approaches that sacrifice speed for accuracy, YORO seeks a better trade-off between speed and accuracy by embracing a single-stage design, without a CNN backbone. YORO consumes natural language queries, image patches, and learnable detection tokens and predicts the coordinates of the referred object, using a single transformer encoder. To assist the alignment between text and visual objects, a novel patch-text alignment loss is proposed. Extensive experiments are conducted on 5 different datasets, with ablations on design choices. YORO is shown to support real-time inference and to outperform all approaches in this class (single-stage methods) by large margins. It is also the fastest VG model and achieves the best speed/accuracy trade-off in the literature. Code is available at https://github.com/chihhuiho/yoro.
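To make the single-encoder input concrete, the following minimal sketch shows one way the three token groups named in the abstract (language query tokens, image patches, learnable detection tokens) could be laid out in a single sequence, and how a predicted box might be mapped back to pixels. It is an illustration under stated assumptions, not the authors' implementation: the token ordering, the names `build_layout` and `box_to_pixels`, the example sizes (40 text tokens, a 384x384 image, 32-pixel patches, 5 detection tokens), and the normalized (cx, cy, w, h) box format are all hypothetical choices that the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class SequenceLayout:
    """Index ranges of each token group inside one encoder input sequence."""
    text: slice
    patches: slice
    det: slice
    total: int


def patch_grid(height: int, width: int, patch_size: int) -> tuple:
    """Return (rows, cols) of non-overlapping square patches covering the image."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    return height // patch_size, width // patch_size


def build_layout(num_text_tokens: int, height: int, width: int,
                 patch_size: int, num_det_tokens: int) -> SequenceLayout:
    """Lay out [text | patches | detection tokens] in one sequence.

    The ordering is an assumption made for illustration only.
    """
    rows, cols = patch_grid(height, width, patch_size)
    text_end = num_text_tokens
    patch_end = text_end + rows * cols
    det_end = patch_end + num_det_tokens
    return SequenceLayout(
        text=slice(0, text_end),
        patches=slice(text_end, patch_end),
        det=slice(patch_end, det_end),
        total=det_end,
    )


def box_to_pixels(box, height: int, width: int) -> tuple:
    """Convert an assumed normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )


# Hypothetical sizes, chosen only to show the bookkeeping.
layout = build_layout(num_text_tokens=40, height=384, width=384,
                      patch_size=32, num_det_tokens=5)
print(layout.total)                                     # 40 + 144 + 5 = 189
print(box_to_pixels((0.5, 0.5, 0.5, 0.5), 384, 384))    # (96.0, 96.0, 288.0, 288.0)
```

Because every group lives in the same sequence, a single encoder pass lets the detection tokens attend to both text and patches; the box would then be read from the outputs at `layout.det`.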
Journal
Journal title: Lecture Notes in Computer Science
Year: 2023
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-25085-9_1